Close Reading with Computers
Textual Scholarship, Computational Formalism, and David Mitchell's Cloud Atlas
Martin Paul Eve



Chapter 2

READING GENRE COMPUTATIONALLY

Genre is as tricky to define as is the “archive.” It is a cliché to note that nobody seems able to agree precisely on what we mean by genre. Is it, as Jacques Derrida once characteristically teased, the case that “genres are not to be mixed” and that “as soon as genre announces itself, one must respect a norm, one must not cross a line of demarcation, one must not risk impurity, anomaly, or monstrosity”?1 Are genres part of rule-following practices conducted by writers, in which various conventions are internalized and reproduced according to shared Wittgensteinian communal undertakings?2 Does the study of genre succumb to Robert Stam’s critiques of extension, normativism, homogeneic definitions, and biologism?3 Are our analyses of genre always ones of cyclicality and endless regression, seeking an origin for a practice that is defined by sorting into already-existent categories?4 (Yet, if so: whence these categories?) Do texts that play with genre in metafictional ways ask us to consider the structure of genre, as they once asked us to consider the study of history?5 Or is genre actually something more akin to what Lauren Berlant proposes in Cruel Optimism (2011), where she writes of conceiving of a moment from within that moment itself as “a temporal genre whose conventions emerge from the personal and public filtering of the situations and events that are happening in an extended now whose very parameters . . . are also always there for debate,” an emerging social arrangement that provides “an affective expectation of the experience of watching something unfold”?6

Many questions and no consensus. For scholarly writing about genre is a genre of its own, particularly within fields such as speculative/science fiction into which I have not here even delved. I have written, thus far in this book, of Mitchell’s multigenericity in Cloud Atlas and the ways in which theme and language begin to intersect to create a generic tapestry, all without defining genre. I do not and cannot here define genre in an adequate way to address its decades of study. We can, though, ask a smaller-scale question: what does it mean to “write as David Mitchell does in Cloud Atlas”? What distinctive traits might computational microscopy identify within Mitchell’s writing and style that are invisible to the naked eye? What could such an analysis tell us about genre? As Ted Underwood has put it, computational methods can turn the lack of definition of genre to their advantage: “We can dispense with fixed definitions, and base the study of genre only on the shifting practices of particular historical actors—but still produce models of genre substantive enough to compare and contrast.”7

One of the most basic things that we can do with computational techniques is to conduct an analysis of the most frequently used words in a text. That doesn’t sound very exciting on its own, and such an approach has been the subject of critique and even ridicule, but it turns out that the subconscious ways in which authors use seemingly insignificant words are extremely effective markers for authorship attribution.8 That is, most texts by the same author can be accurately clustered by comparing the distance between word frequencies within each work (on which, more below). I wondered, though, what would happen if I undertook such an analysis on each section of Mitchell’s novel. Would the underlying—and presumed subconscious—elements of language change between sections? Or would we, in fact, end up with Mitchell’s persona inscribed within these texts? A set of stylometric techniques can help us to answer some of these questions.
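
To make this concrete, here is a minimal sketch, in Python, of the kind of most-frequent-word count described above. It is an illustration of the general procedure rather than the exact analysis used in this chapter, and the file names for the novel’s sections are hypothetical placeholders.

```python
# Illustrative sketch only: count the most frequent word tokens in each
# section of a text. The file names below are hypothetical placeholders.
from collections import Counter
import re

def most_frequent_words(text, n=30):
    """Return the n most frequent lowercased word tokens in a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common(n)

# Hypothetical usage: one plain-text file per narrative section.
for path in ["section_pacific_journal.txt", "section_letters.txt"]:
    with open(path, encoding="utf-8") as handle:
        print(path, most_frequent_words(handle.read(), n=10))
```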

As the name implies, computational stylometry is the use of computers to measure (-metry) the stylistic properties of texts (stylo-). Stylometry, as a quantifying activity, has a long and varied history, from legal court cases where the accused was acquitted on the basis of stylometric evidence, such as that of Steve Raymond, through to authorship attribution.9 In the latter case, as charted by Anthony Kenny, the discipline dates back to approximately 1851, when Augustus de Morgan suggested that a dispute over the attribution of certain biblical epistles could be settled by measuring average word lengths and correlating them with known writings of St. Paul.10 From this humble beginning we are now at the point where it is claimed that computational forensic stylometry can “identify individuals in sets of 50 authors with better than 90% accuracy, and [can] even [be] scaled to more than 100,000 authors.”11

A significant breakthrough for stylometry, or at least a key moment of success, came in 1964 with the publication of Mosteller and Wallace’s work on the set of pseudonymously published Federalist papers of 1787–88, which argued for the adoption of the proposed Constitution for the United States. Mosteller and Wallace analyzed the distribution of thirty function words (articles, pronouns, etc.) throughout the Federalist papers and managed to come to the same conclusion of authorship as other historians, based in this case on statistically inferred probabilities and Bayesian analysis.12 As Patrick Juola frames it, there are several reasons why this corpus formed an important test-bed for stylometry: “First, the documents themselves are widely available . . . , including over the Internet through sources such as Project Gutenberg. Second, the candidate set for authorship is well-defined; the author of the disputed papers is known to be either Hamilton or Madison. Third, the undisputed papers provide excellent samples of undisputed text written by the same authors, at the same time, on the same topic, in the same genre, for publication via the same media.” In Juola’s words, “a more representative training set would be hard to imagine.”13

If the Federalist papers represent a significant success for stylometric authorship attribution, there have also been some disastrous failures. In the early 1990s a series of criminal court cases turned to forensic stylometry to identify authorship of documents (for example, Thomas McCrossen’s appeal in London in July of 1991; the prosecution of Frank Beck in Leicester in 1992; the Dublin trial of Vincent Connell in December of 1991; Nicky Kelly’s pardon by the Irish government in April of 1992; the case of Joseph Nelson-Wilson in London in 1992; and the Carl Bridgewater murder case).14 Indeed, it is frequently the case that court trials turn on the authorship of specific documents, be they suicide notes, sent emails, or written letters.15 These specific cases, however, all relied on a particular technique known as “qsum” or “cusum”—for “cumulative sum” of the deviations from the mean—which is designed to measure the stability of a measured feature of a text.16 The only problem here was that, almost immediately, the cusum technique came under intense scrutiny and theoretical criticism, ending in a live-television broadcast failure of an authorship attribution test using this method.17 Despite this failing, specific stylometric techniques remain available as evidence in courts of law depending on their academic credibility and the jurisdiction’s specific laws on admissibility.18
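
For illustration only, the arithmetic behind a cusum chart is straightforward. A minimal sketch, assuming a per-sentence feature such as sentence length in words, might run as follows (the sample values are invented):

```python
# Illustrative sketch of the cusum idea: the running (cumulative) sum of
# deviations from the mean of a per-sentence feature. Sample values invented.
def cusum(values):
    mean = sum(values) / len(values)
    running, out = 0.0, []
    for value in values:
        running += value - mean
        out.append(running)
    return out

sentence_lengths = [12, 9, 15, 22, 7, 11]  # hypothetical word counts per sentence
print(cusum(sentence_lengths))
```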

The other most well-known case of failure in the field of stylometry occurred in the late 1990s, when Don Foster attributed the poem “A Funeral Elegy” to William Shakespeare using a raft of stylometric approaches.19 The attendant press coverage landed this claim on the front page of the New York Times, and the community of traditional Shakespeare scholars reacted in disbelief. When Foster refused to accept traditional historicist arguments against his attribution, stylometric work by multiple groups of scholars pointed to the seventeenth-century playwright and poet John Ford as the far more likely author of the poem, which Foster eventually accepted.20 While, as Juola points out, “this cut-and-thrust debate can be regarded as a good (if somewhat bitter) result of the standard scholarly process of criticism,” for many scholars it marked the sole interaction that they have ever had with stylometry, and the result could only be a perception of notoriety, braggadocio, and inaccuracy.21

That said, in recent years there have also been some extremely successful algorithmic developments for detecting authorship. Perhaps the best known of these is the 2002 “Burrows’s delta.”22 With apologies for a brief mathematical explanation over the next page or so, Burrows’s delta (the word here referring to the mathematical symbol for “difference”: Δ) consists of two steps to conduct a multivariate statistical authorship attribution. First of all, one measures the most frequent words that occur in a text and then relativizes these using a “z-score” measure. A z-score measurement is basically asking, “By how many standard deviations does a word’s frequency differ from the mean frequency of the word set?” The first thing that we would calculate here is the “standard deviation” of the entire word set. The standard deviation is the square root of the average of the squared deviations of the values from their mean. Or, in other words: work out the average frequency with which words occur in a text; then work out (for each word) how many more or fewer times that word occurs relative to the average; square this and add up all such deviations; then divide this by the number of words; then take the square root of the result. To get the z-score, we next take an individual word’s frequency, subtract the average (mean) frequency, and divide this result by the standard deviation of the whole set. This is conventionally written as score (X) minus mean (mu: μ) divided by sigma (standard deviation: σ):

z = (X − μ) / σ

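A minimal sketch of this first, z-scoring step, following the simplified description above, might look as follows in Python (the frequency counts are invented):

```python
# Illustrative sketch of the z-score step: (frequency - mean) / standard
# deviation for each of the most frequent words. Counts below are invented.
from statistics import mean, pstdev

def z_scores(frequencies):
    mu = mean(frequencies.values())
    sigma = pstdev(frequencies.values())
    return {word: (count - mu) / sigma for word, count in frequencies.items()}

counts = {"the": 620, "a": 310, "i": 280, "to": 255, "of": 240, "in": 190}
print(z_scores(counts))
```
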
Once we have a ranked series of z-scores for each term, the second operation in Burrows’s delta is to calculate the difference between the words in both texts. This means taking the z-score of, say, the word the in text A and subtracting the z-score of the word the in text B. Once we have done this for every word that we wish to take into account, we add the absolute values of all of these differences together, a move that is the mathematical equivalent of taking the “Manhattan distance” (named because it moves across the multidimensional grid in right-angled blocks like the streets in the borough of Manhattan, rather than going “as the crow flies”) between the multidimensional space plots of these terms.23 That is, if you plot each text as a point in a multidimensional space, with one axis for the frequency of each word, the Manhattan distance is the route you have to take, in 90-degree turns, to get from one text’s point to the other’s. In Burrows’s delta, the smaller this total addition of differences is, the more likely it is that the two texts were written by the same author.
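
A minimal sketch of this second step, again following the description above, sums the absolute differences between two texts’ z-scores over a shared word list (the z-score values here are invented):

```python
# Illustrative sketch of the second step: the Manhattan distance between two
# texts' z-score profiles. Smaller values suggest greater stylistic closeness.
def delta(z_a, z_b):
    shared = set(z_a) & set(z_b)
    return sum(abs(z_a[word] - z_b[word]) for word in shared)

z_text_a = {"the": 1.2, "a": -0.3, "i": 0.8, "to": -0.5}  # invented values
z_text_b = {"the": 0.9, "a": -0.1, "i": 1.4, "to": -0.7}  # invented values
print(delta(z_text_a, z_text_b))
```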

Burrows’s delta has been seen as a successful algorithm for many years, as validated in several studies.24 It is, mathematically speaking, relatively easy to calculate and seems to produce good results. But it is not entirely known why the delta method is so good at clustering texts written by the same author, although recent work has suggested that such a “text distance measure is particularly successful in authorship attribution if emphasizing structural differences of author style profiles without being too much influenced by actual amplitudes,” as Burrows’s delta does.25

Burrows’s delta is also, by now, a somewhat outdated approach to computational authorship attribution. As of 2019, if one wanted to classify a text as written by one author or another, one would usually construct a model of authors and texts using machine learning methods for identification rather than using a mathematical algorithmic process.26 This would typically involve profiling a range of features and balancing them against one another within the model that one creates.27 This is the type of “model thinking” toward which Caroline Levine has recently gestured: ways of thinking that are compatible with humanistic scholarly practice but that move “across scales and media.”28
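
This is not the method used in this book, but a generic sketch of that kind of model-based classification, assuming the scikit-learn library and hypothetical training texts, might look something like this:

```python
# Illustrative sketch only (not this book's method): train a simple classifier
# on word-frequency features. Texts and labels are hypothetical placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = ["longer sample of text by author A ...", "longer sample of text by author B ..."]
train_labels = ["A", "B"]

model = make_pipeline(TfidfVectorizer(max_features=500), LogisticRegression())
model.fit(train_texts, train_labels)
print(model.predict(["a disputed text of unknown authorship ..."]))
```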

But Burrows himself was always cautious about what he was doing. When writing of “authorial fingerprints,” for example, he noted that “we do not yet have either proof or promise” of the “very existence” of such a phenomenon.29 Burrows also points out that, “not unexpectedly,” his method “works least well with texts of a genre uncharacteristic of their author and, in one case, with texts far separated in time across a long literary career.”30 So why use the delta method at all? Why not use a better, newer machine learning approach to text classification? In this chapter I am not actually interested in identifying authorship. We know from Chapter 1 that with the exception of Ebershoff’s edits to the Sonmi~451 chapter in E, David Mitchell is the author of all the diverging segments of Cloud Atlas. A machine learning approach might confirm this or get it wrong. But machine learning approaches are also notoriously difficult to inspect. The reasons why a machine learning algorithm has made a specific classification are hard to determine. By contrast, I seek to examine the different linguistic properties of texts written in a variety of linguistic genres by the same author; that is, I wish to look at the process of classification rather than the end result. Algorithmic failure, in such cases, becomes intensely productive as it reveals the fault lines of difference within a text. Burrows’s delta is a much better method for this type of work. It is an algorithm with a strong track record, backed by mathematics that can be understood by humans, even when operationalized, unlike many newer unsupervised or partly supervised machine learning approaches such as topic modeling, word embedding, or sentiment analysis.31 This trajectory also brings us to a point where it is worth delving deeper into the underlying assumptions of many stylometric methods.

ASSUMPTIONS ABOUT WRITING STYLE

There are a number of supposed premises on which most stylometric methods rest, and these pertain to their uses as means of identifying authorship. Before moving to work on Cloud Atlas, I want briefly to cover these since they bear more broadly on how we conceive of literary style. These assumptions are (1) that authors have a “stylistic naturalism,” (2) that stylometry measures subconsciously inscribed features of a text, and (3) that authorship is the underlying textual feature that can be ascertained by the study of quantified formal aesthetics.

The first of these assumptions, that there is a “stylistic naturalism” to an author’s works, is premised on the idea that most of us, when writing, do not consider how our works will be read by computers. As Brennan and Greenstadt put it: “In many historical matters, authorship has been unintentially [sic] lost to time and it can be assumed that the authors did not have the knowledge or inclination to attempt to hide their linguistic style. However, this may not be the case for modern authors who wish to hide their identity.”32 Language is a tool of communication among people, designed to convey or cause specific effects or affects. The stylistic features of texts are usually considered contributors to the overarching impact of the communication. Indeed, the scansion and rhythm of a work of prose, for instance, are important features of well-written texts, the three-part list being a good example of this in persuasive rhetoric. Yet the selection and prioritization of specific stylistic features (rhythm, cadence, word length, repetition) have consequential effects on the other elements of language that are deployed.

In other words, and to put it bluntly: there are hundreds of stylistic traits of texts that we can measure and determine. It is not possible for an author to hold all of these in working memory while writing; instead, authors write for intended readerly outcomes. The presumption that an imagined reader will react in various ways to one’s writing is, or at least should be, the overarching concern when writing. It is this that leads to an idea of what I call a stylistic naturalism: the conceit that authors write in ways that are somehow blind to the processes of stylometric measurement.

I would instead seek to couch this slightly differently. Any good author is aware that his or her writing is to be “measured”—so to speak—by a reader. But there is a constant play of balance at work here. In prioritizing one set of measurements—for instance, one could notice as a reader the long, rambling sentences of David Foster Wallace’s Infinite Jest (1996)—others must inevitably be ignored. Authors are aware that they are being measured; they must simply choose which measures are of most significance for their literary purposes. This is a type of “natural” writing, then, that can only be called natural in that it is social and not individual. Anticipated readerly reactions condition the writing process. As Juola puts it, on the one hand, “the assumption of most researchers . . . is that people have a characteristic pattern of language use, a sort of ‘authorial fingerprint’ that can be detected in their writings. . . . On the other hand, there are also good practical reasons to believe that such fingerprints may be very complex, certainly more complex than simple univariate statistics such as average word length or vocabulary size.”33

A subassumption underlying the “stylistic naturalism” claim is that authors behave in the same way when writing their various works—or, at least, that stylometric profiles do not substantially change even if authors deliberately try to alter their own styles. This also assumes that authors’ own styles do not change naturally with time—a contentious claim.34 Indeed, in 2014 Ariel Stolerman and colleagues identified shifting stylometric profiles of authors as a key failing in traditional “closed-world” settings.35 (What Stolerman et al. mean by “closed-world” here is that there is a known list of probable authors, and a computational classifier is trained to correctly attribute unknown works based on known stylometric profiles rather than an environment where any author should be grouped apart from all others.) Yet what happens, in stylometric terms, when an author such as Sarah Waters moves from a neo-Victorian mode to writing about the Second World War? What happens when Hilary Mantel writes about Margaret Thatcher, as opposed to the Tudor setting of Wolf Hall (2009)? What happens when Sarah Hall shifts from the feminist utopian genre of The Carhullan Army (2007) to the more naturalistic and contemporaneous setting of The Wolf Border (2015)?

These questions bring us to the obverse, but somehow linked, counterpart of the assumption that there might be a stylistic naturalism—that is, that stylometry can measure subconsciously inscribed elements of texts. As David Holmes puts it, at the heart of stylometry “lies an assumption that authors have an unconscious aspect to their style, an aspect which cannot consciously be manipulated but which possesses features which are quantifiable and which may be distinctive.”36 This is a different type of stylistic naturalism claim, one that, instead of asserting that authors are behaving in ways that make them unaware of stylometric profiling, looks to an author’s subconscious as a site of unchangeable linguistic practice. Indeed, Freudian psychoanalysis has long held that aspects of communication and language harbor revelations about a person over which that person has little or no control.

That said, as I will show shortly, all but one of the different narrative sections of Cloud Atlas E can be distinguished from one another through the relative frequencies of the terms the, a, I, to, of, and in. Yet who among us, when writing, is conscious of the relative frequency with which we ourselves use these terms? These seemingly unimportant articles, pronouns, and prepositions are used when we need them, not usually as a conscious stylistic choice. In other words, the internalized stylistic profile of our individual communications usually determines how, why, and how frequently these terms are used; they are thought to be beyond our control. Such features are, therefore, conceived of as subconsciously inscribed elements of a text that it is difficult for an author to modify, even if he or she knows that stylometric profiling will be conducted on that text. As I will go on to show, David Mitchell’s novel, in its genre play, does manipulate such features.

All of this brings me to the final assumption that I identify in most work on stylometry—namely, that authorship is the underlying textual feature that can be ascertained by the study of quantified formal aesthetics. Of course, there are lengthy poststructuralist debates about what authorship actually means for the reception of texts.37 There are also disputes in labor and publishing studies about how the individual work of “authorship” is prioritized above all others, when actually there are many forms of labor without which publishing would not be possible: typesetting/text encoding, copyediting, proofreading, programming, graphical design, format creation, digital preservation, platform maintenance, forward-migration of content, security design, marketing, social media promotion, implementation of semantic machine-readability, licensing and legal protocols, and the list goes on. The first challenge here for stylometry is to understand what impact these polyvalent labor practices have in the crafting of a single, authorial profile. As above, we know that David Ebershoff requested substantial line edits to the US edition of Cloud Atlas. What sense does it then make to say that the figure identified as “David Mitchell” would correlate to a stylistic profile of this text? At best, if the stylometry is working correctly as an attribution system centered on the author, it would identify this text as a harmonized fusion of Mitchell and Ebershoff.

The challenge that I actually want to pose to these three straw-figures that I have drawn up against many stylometric practices is one foreshadowed by Matt Jockers and others at the Stanford Literary Lab, namely that the author-signal is often neither the sole nor the most important signal that we can detect through stylometry.38 Indeed, the first pamphlet of the Stanford Literary Lab found that, while the pull of the author-signal was strong and seemed to outweigh other signals, various quantitative signatures also corresponded to those features that we might call “genre.”39 Instead, especially in the case of Mitchell’s rich and varied novel, one version of which was heavily edited by another person, and which deliberately employs mimicry and pastiche to achieve its proliferation of stylistic effects, it might be more appropriate to consider the genre signals that a text emits.

Notes

1. Jacques Derrida, “The Law of Genre,” trans. Avital Ronell, Critical Inquiry 7, no. 1 (1980): 55, 57.

2. For more on this and the associated literature see Ludwig Wittgenstein, Remarks on the Foundations of Mathematics, 3rd ed. (Oxford: Blackwell, 1978); Saul Kripke, Wittgenstein on Rules and Private Language (Oxford: Blackwell, 1982); Crispin Wright, “Wittgenstein’s Rule-Following Considerations and the Central Project of Theoretical Linguistics,” in Reflections on Chomsky, ed. Alexander L. George (Oxford: Blackwell, 1989), 233–64; G. P. Baker and P. M. S. Hacker, Wittgenstein: Rules, Grammar and Necessity, vol. 2, An Analytical Commentary on the Philosophical Investigations (Oxford: Blackwell, 1985).

3. Robert Stam, “Text and Intertext: Introduction,” in Film Theory: An Anthology, ed. Robert Stam and Toby Miller (Malden, MA: Blackwell, 2000), 151–52; Martin Paul Eve, Literature Against Criticism: University English and Contemporary Fiction in Conflict (Cambridge: Open Book Publishers, 2016), 166.

4. Andrew Tudor, Theories of Film (London: British Film Institute, 1974), 135.

5. Martin Paul Eve, “New Rhetorics: Disciplinarity and the Movement from Historiography to Taxonomography,” in Metahistorical Narratives and Scientific Metafictions, ed. Giuseppe di Episcopo (Naples: Edizioni Cronopio, 2015), 101–22.

6. Lauren Berlant, Cruel Optimism (Durham, NC: Duke University Press, 2011), 4–7.

7. Ted Underwood, “The Life Cycles of Genres,” Journal of Cultural Analytics, May 23, 2016, https://doi.org/10.22148/16.005.

8. For the critique see Timothy Brennan, “The Digital-Humanities Bust,” Chronicle of Higher Education, Oct. 15, 2017, www.chronicle.com/article/The-Digital-Humanities-Bust/241424.

9. See A. Q. Morton’s widely discredited Literary Detection: How to Prove Authorship and Fraud in Literature and Documents (Epping: Bowker, 1978), 205–6; but also Patrick Juola, “Stylometry and Immigration: A Case Study,” Journal of Law and Policy 21, no. 2 (2012): 287–98.

10. Anthony Kenny, The Computation of Style: An Introduction to Statistics for Students of Literature and Humanities (Oxford: Pergamon, 1982), 1.

11. Ariel Stolerman et al., “Breaking the Closed-World Assumption in Stylometric Authorship Attribution,” in Advances in Digital Forensics X, ed. Gilbert Peterson and Sujeet Shenoi (Berlin: Springer, 2014), 186.

12. F. Mosteller and D. L. Wallace, Inference and Disputed Authorship: The Federalist (Reading, MA: Addison-Wesley, 1964).

13. Patrick Juola, “Authorship Attribution,” Foundations and Trends® in Information Retrieval 1, no. 3 (2007): 242–43, https://doi.org/10.1561/1500000005.

14. David I. Holmes, “The Evolution of Stylometry in Humanities Scholarship,” Literary and Linguistic Computing 13, no. 3 (1998): 114; Juola, “Authorship Attribution,” 243.

15. C. E. Chaski, “Who’s at the Keyboard: Authorship Attribution in Digital Evidence Investigations,” International Journal of Digital Evidence 4, no. 1 (2005): 0–13, www.utica.edu/academic/institutes/ecii/publications/articles/B49F9C4A-0362-765C-6A235CB8ABDFACFF.pdf.

16. J. M. Farringdon, Analyzing for Authorship: A Guide to the Cusum Technique (Cardiff: University of Wales Press, 1996).

17. David Canter, “An Evaluation of ‘Cusum’ Stylistic Analysis of Confessions,” Expert Evidence 1, no. 2 (1992): 93–99; R. A. Hardcastle, “Forensic Linguistics: An Assessment of the CUSUM Method for the Determination of Authorship,” Journal of the Forensic Science Society 33, no. 2 (1993): 95–106, https://doi.org/10.1016/S0015-7368(93)72987-4; R. A. Hardcastle, “CUSUM: A Credible Method for the Determination of Authorship?” Science and Justice 37, no. 2 (1997): 129–38, https://doi.org/10.1016/S1355-0306(97)72158-0; M. L. Hilton, “An Assessment of Cumulative Sum Charts for Authorship Attribution,” Literary and Linguistic Computing 8, no. 2 (1993): 73–80, https://doi.org/10.1093/llc/8.2.73; David I. Holmes and Fiona Tweedie, “Forensic Stylometry: A Review of the Cusum Controversy,” La Revue informatique et statistique dans les sciences humaines 31, nos. 1–4 (1995): 19–47; Juola, “Authorship Attribution,” 233–34.

18. C. E. Chaski, “The Keyboard Dilemma and Forensic Authorship Attribution,” Advances in Digital Forensics 3 (2007); G. McMenamin, “Disputed Authorship in US Law,” International Journal of Speech, Language and the Law 11, no. 1 (2004): 73–82.

19. J. W. Grieve, “Quantitative Authorship Attribution: A History and an Evaluation of Techniques” (master’s thesis, Simon Fraser University, 2005), http://summit.sfu.ca/item/8840.

20. W. Elliot and R. J. Valenza, “And Then There Were None: Winnowing the Shakespeare Claimants,” Computers and the Humanities 30 (1996): 191–245; W. Elliot and R. J. Valenza, “The Professor Doth Protest Too Much, Methinks,” Computers and the Humanities 32 (1998): 425–90; W. Elliot and R. J. Valenza, “So Many Hardballs, so Few over the Plate,” Computers and the Humanities 36, no. 4 (2002): 455–60.

21. Juola, “Authorship Attribution,” 245.

22. John Burrows, “‘Delta’: A Measure of Stylistic Difference and a Guide to Likely Authorship,” Literary and Linguistic Computing 17, no. 3 (2002): 267–87, https://doi.org/10.1093/llc/17.3.267.

23. S. Argamon, “Interpreting Burrows’s Delta: Geometric and Probabilistic Foundations,” Literary and Linguistic Computing 23, no. 2 (2007): 131–47, https://doi.org/10.1093/llc/fqn003.

24. David Hoover, “Testing Burrows’s Delta,” Literary and Linguistic Computing 19, no. 4 (2004): 453–75; J. Rybicki and M. Eder, “Deeper Delta Across Genres and Languages: Do We Really Need the Most Frequent Words?” Literary and Linguistic Computing 26, no. 3 (2011): 315–21, https://doi.org/10.1093/llc/fqr031.

25. Stefan Evert et al., “Outliers or Key Profiles? Understanding Distance Measures for Authorship Attribution,” in Digital Humanities 2016: Conference Abstracts (Digital Humanities 2016, Jagiellonian University and Pedagogical University, Kraków), http://dh2016.adho.org/abstracts/253.

26. See, e.g., Hoshiladevi Ramnial, Shireen Panchoo, and Sameerchand Pudaruth, “Authorship Attribution Using Stylometry and Machine Learning Techniques,” in Intelligent Systems Technologies and Applications (Cham, CH: Springer, 2016), 113–25, https://doi.org/10.1007/978-3-319-23036-8_10.

27. For more on computational modeling and loss see Richard Jean So, “All Models Are Wrong,” PMLA 132, no. 3 (2017): 668–73.

28. Caroline Levine, “Model Thinking: Generalization, Political Form, and the Common Good,” New Literary History 48, no. 4 (2017): 644, https://doi.org/10.1353/nlh.2017.0033.

29. Burrows, “‘Delta,’” 268.

30. Burrows, 267.

31. My thanks to one of my anonymous readers for this point.

32. Michael Robert Brennan and Rachel Greenstadt, “Practical Attacks Against Authorship Recognition Techniques,” in Proceedings of the Twenty-First Conference on Innovative Applications of Artificial Intelligence (Palo Alto, CA: AAAI, 2009), www.cs.drexel.edu/~greenie/brennan_paper.pdf.

33. Juola, “Authorship Attribution,” 239.

34. See Edward W. Said, On Late Style (London: Bloomsbury, 2006).

35. Stolerman et al., “Breaking the Closed-World Assumption.”

36. Holmes, “The Evolution of Stylometry,” 111.

37. Roland Barthes, “The Death of the Author,” in Image, Music, Text, trans. Stephen Heath (London: Fontana, 1987), 142–48; Michel Foucault, “What Is an Author?” in The Essential Works of Michel Foucault, 1954–1984, 3 vols. (London: Penguin, 2000), 2:205–22; Seán Burke, The Death and Return of the Author: Criticism and Subjectivity in Barthes, Foucault and Derrida, 3rd rev. ed. (Edinburgh: Edinburgh University Press, 2008).

38. Matthew L. Jockers, Macroanalysis: Digital Methods and Literary History (Urbana: University of Illinois Press, 2013).

39. Sarah Allison et al., “Quantitative Formalism: An Experiment,” Stanford Literary Lab (pamphlet), 2011, https://litlab.stanford.edu/LiteraryLabPamphlet1.pdf.